In 2023, a team of researchers realised a social survey for the Umbrian association una Regione per Restare - RxR. The research focused mainly on two questions:
What compels so many people to leave their region?
Why do some still decide to stay?
As this was the first research project promoted by the association, the sample is patchy and excessively small. Since the survey was administered through snow-balling, there is also an evident self-selection problem (too many students, too many individuals aged 20-25 due to the survey being distributed mainly through universities). These problems have been discussed at length internally and will all be taken into account in future projects. In the meantime, the data was used to conduct a merely descriptive analysis, in order to obtain at list a few hints on the next steps to be taken. A good portion of this exact work was carried out in order to explain the issues with the data collection process itself to a non-technical audience.
About a year after the beginning of the survey’s distribution, I was involved as a data analyst, and contributed to the writing of the final report. Below is a selection of my contributions, both in terms of internal and external communication.
Since the whole work was written in Italian, the graphs and charts are not translated. The source code and data is available at the link on the top right, which leads to Lucia Temperini and I’s GitHub repository. Although code boxes were included in the post for reference, the final graphics were post-produced in Pixelmator. A selection of these are included in the gallery right below, before the actual post.
## Domicilio - cleaningAbitare$dom <-as.character(Abitare$dom)Abitare$dom <-ifelse(Abitare$dom =='Stesso della residenza', Abitare$res, Abitare$dom)## Provinceprov <- geoCod |>select(`Denominazione (Italiana e straniera)`, `Unità territoriale sovracomunale`) |>rename(denom =`Denominazione (Italiana e straniera)`,prov =`Unità territoriale sovracomunale`)Abitare <-left_join(Abitare, prov, by =join_by(dom == denom))Abitare <- Abitare |>relocate(prov, .after = dom) |>rename(prov_dom = prov)## Select and merge quest datadf <-left_join(Lavorare |>select(id, occ, gen), Abitare |>select(id, dom, eta))### factor etàdf$eta <-as.numeric(df$eta)df <- df |>mutate(class_eta =case_match(eta,c(15:19) ~'15-20',c(20:24) ~'20-25',c(25:29) ~'25-30',c(30:34) ~'30-35',c(35:39) ~'35-40',c(40:45) ~'40-45' ))### occ cleaningdf <- df |>mutate(occ =gsub((' (inclusi contratti a nero, precari, di ricerca, stage, servizio civile)'), '', occ, fixed = T),occ =gsub((' (inclusi contratti a nero, precari, di ricerca, stage, servizio civile)'), '', occ, fixed = T))## dom frequency tabledomdf <- df |>group_by(dom) |>count()### geospatial data mergedomdf <-left_join(domdf, sf |>select(COMUNE, geometry),by =join_by(dom == COMUNE))
The main problem with the questionnaire was its coverage. In discussing the problem with the rest of our organisation, mapping it out was the best way to explain our dissatisfaction with the results. The tooltip reveals the number of responses in each municipality. The extremely low numbers would be a problem in and of themselves, but the extreme concentration in the city of Perugia (where the University is located) renders the sample impossible to use for statistical inference.
The gray areas are municipalities (comuni) that we could not reach at all.
The extremely unequal socio-demographical composition of the sample did not reassure us in any way. The choice of snow-balling as a distribution method introduced significant self-selection problems, which lead to the situation shown in the figure
The following section tries to answer the main question directly: what brings people to leave their home? Why do some still decide to stay?
Relationship to Umbria
The first step of the analysis was purely descriptive: how big is the fraction of our sample which left, or would like to? An infographic-style waffle chart was the best choice to convey meaning keeping sense of scale.
Show code
## Rapporto con la regione -----------------------------------------------------Restare |>group_by(rapp) |>count() |>mutate(rapp =case_match(rapp,'Vorrei restare nel posto in cui vivo'~'I\'d like to stay where I am','Sarei contento di vivere e lavorare altrove'~'I\'d like to live and work somewhere else','Vorrei restare ma non posso'~'I\'d like to stay, but I can\'t','Vorrei partire ma non posso'~'I\'d like to leave, but I can\'t'),rapp =factor(rapp, levels =c('I\'d like to stay where I am','I\'d like to stay, but I can\'t','I\'d like to leave, but I can\'t','I\'d like to live and work somewhere else' ))) |>waffle(size =1,flip = T,reverse = T,legend_pos ='right')
Reasons for staying
This last section aimed at finding the exact reasons why people either stay or go. Radar plots were the best way to convey the relative importance of each individual reason.
Show code
## Motivi per restare ----------------------------------------------------------### graphics df#### counting each columnrest <-tibble(.rows =4, choice =c('Per nulla','Poco','Abbastanza','Molto'))for (i in1:10) {vec <- Restare[startsWith(names(Restare),"rest")] |>group_by(Restare[startsWith(names(Restare),"rest")][i]) |>drop_na() |>count(name =paste(names(Restare[startsWith(names(Restare),"rest")][i]), '_n')) |>rename('choice'=names(Restare[startsWith(names(Restare),"rest")][i]))rest <-full_join(rest, vec |>mutate(choice = choice))}rm(i)#### data wranglingrest2 <-data.frame(t(rest[-1])) # swapping columns-rowscolnames(rest2) <- rest$choicerest <-rownames_to_column(rest2) |>mutate(rowname =gsub('_n', '', rowname)) |># column rename(choice = rowname)rm(rest2)rest <- rest |>mutate(index =round(((Abbastanza + Molto)/89)*100, 2)) # % di abbastanza + molto importante### graphicslabels<-data.frame(y =c(25,50,75,100),x =rep(0.25,4))rest |>filter(choice !='restFort ') |>mutate(choice =case_match(choice,'restLeg '~'Legame/impegno per la comunità','restSoc '~'Contatti sociali e umani più gratificanti','restNat '~'Contatto con la natura','restQual '~'Qualità e stile di vita','restOpp '~'Opportunità anche nel restare','restImp '~'Idea imprenditoriale','restFam '~'Esigenze personali/familiari','restCost '~'Costo della vita più basso','restAmb '~'Scarsa importanza alla carriera' )) |>ggplot(aes(x = choice, y = index, fill = choice)) +geom_col() +coord_polar() +scale_y_continuous(limits =c(0, 85)) +labs(title ='Motivi per restare') +scale_fill_manual(values =met.brewer('Tiepolo', 9)) +theme_void() +theme(axis.title =element_blank(),legend.position ='right',legend.title =element_blank(),plot.title =element_text(size =20, hjust = .5))## Motivi per lasciare ---------------------------------------------------------### graphics df#### counting each columnlasc <-tibble(.rows =4, choice =c('Abbastanza', 'Molto', 'Per nulla', 'Poco'))for (i in1:10) { vec <- Restare[startsWith(names(Restare),"lasc")] |>group_by(Restare[startsWith(names(Restare),"lasc")][i]) |>drop_na() |>count(name =paste(names(Restare[startsWith(names(Restare),"lasc")][i]), '_n')) |>rename('choice'=names(Restare[startsWith(names(Restare),"lasc")][i])) lasc <-full_join(lasc, vec)}rm(i)#### data wranglinglasc2 <-data.frame(t(lasc[-1])) # swapping columns-rowscolnames(lasc2) <- lasc$choicelasc <-rownames_to_column(lasc2) |>mutate(rowname =gsub('_n', '', rowname)) |># column rename(choice = rowname)rm(lasc2)lasc <- lasc |>mutate(index =round(((Abbastanza + Molto)/166)*100, 2)) # % di abbastanza + molto importante### graphicslabels<-data.frame(y =c(25,50,75,100),x =rep(0.25,4))lasc |>mutate(choice =case_match(choice,'lascEsp '~'Ampliare i propri orizzonti','lascOpp '~'Formazione/offerte di lavoro','lascImp '~'Idea imprenditoriale','lascEst '~'Bellezza estetica delle città','lascSoc '~'Relazioni sociali','lascFam '~'Realizzazione familiare','lascServ '~'Offerta di servizi','lascRit '~'Realizzarsi per poi tornare','lascTent '~'Tentare a realizzarsi','lascCult '~'Vita culturale più intensa' )) |>ggplot(aes(x = choice, y = index, fill = choice)) +geom_col() +coord_polar() +scale_y_continuous(limits =c(0, 95)) +labs(title ='Motivi per andare') +scale_fill_manual(values =met.brewer('Tiepolo', 10)) +theme_void() +theme(axis.title =element_blank(),legend.position ='right',legend.title =element_blank(),plot.title =element_text(size =20, hjust = .5))